QC-Bench: Quantum Computing Benchmark for LLMs
QC-Bench is a comprehensive benchmark for evaluating quantum computing knowledge in Large Language Models (LLMs). This repository contains datasets and evaluation scripts to assess how well LLMs understand quantum computing concepts across multiple formats, complexity levels, and languages.

Repository Structure
The repository is organized into four main folders:

1. Main
Contains the core benchmark datasets and evaluation scripts for assessing quantum computing knowledge across topics and model types.

Benchmark Datasets:
qc200.json: 200 core multiple-choice questions
qc1000.json: 1,000 comprehensive multiple-choice questions
qc5184.json: 5,184 full benchmark multiple-choice questions
Topic-specific files for seven core areas:
basic_concepts.json
gates_and_circuit_design.json
qml.json (Quantum Machine Learning)
security.json
error_correction.json
algorithms.json
distributed_computing.json
Evaluation Scripts:
QC_Bench_Main_Experiments.py: Main evaluation script for API-based models
QC_Bench_Main_Experiments_local.py: Evaluation script for locally hosted models
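To inspect a dataset before running an evaluation, a minimal loading sketch may help. The JSON schema is not documented in this README, so the helper `load_questions` is hypothetical and makes only a loose assumption: the file holds either a top-level list of records or a dict of records.

```python
import json
from pathlib import Path


def load_questions(path):
    """Load a benchmark JSON file and return its records as a list.

    Assumption: the top level is a list of records or a dict of records.
    The real schema may differ, so check a file before relying on this.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        # Hypothetical fallback: treat each key/value pair as one record.
        return [{"id": key, "record": value} for key, value in data.items()]
    raise ValueError(f"Unexpected top-level JSON type: {type(data).__name__}")


if __name__ == "__main__":
    path = Path("Main/qc200.json")
    if path.exists():
        questions = load_questions(path)
        print(f"Loaded {len(questions)} records from {path}")
```

If the record count does not match the number in the file name, the schema assumption above is probably wrong for that file.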


2. Multilingual
Evaluates quantum computing knowledge across different languages.

Datasets:
qc200_FR.json: French translation of QC200
qc200_SP.json: Spanish translation of QC200
Evaluation Script:
QC-Bench_Multilingual.py: Script for evaluating models in multiple languages

3. True_and_False
Contains true/false questions and open-ended questions to evaluate performance across different question formats.

Datasets:
qc_tf.json: 416 true/false questions on quantum computing
qc_oe.json: 421 open-ended questions on quantum computing
Evaluation Script:
QC-Bench_True_and_False.py: Script for evaluating models on T/F and open-ended questions
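How the evaluation script scores true/false responses is not described here. As a rough illustration only, the sketch below extracts the first "true" or "false" from a model response and computes accuracy against gold labels. The function names are hypothetical, and taking the first match is a simplification: a response such as "not true" would be read as True.

```python
import re


def parse_true_false(response):
    """Return True/False for the first 'true'/'false' word found, else None."""
    match = re.search(r"\b(true|false)\b", response, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "true"


def accuracy(responses, gold):
    """Fraction of responses whose parsed label equals the gold label.

    Unparseable responses (parsed as None) count as incorrect.
    """
    if len(responses) != len(gold):
        raise ValueError("responses and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(parse_true_false(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)
```

Open-ended answers need a different scoring approach and are not covered by this sketch.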

4. Fine_Tuning
Provides resources for fine-tuning smaller models on quantum computing content.

Datasets:
qc4167.json: 4,167 questions for training
qc1000_test.json: 1,000 questions for testing
qc4167_qa.jsonl: Training data in JSONL format
Fine-tuning Script:
QC_Bench_Fine_Tuning.py: Script for fine-tuning models on quantum computing data
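The JSONL training file stores one JSON object per line. A minimal reader could look like the sketch below; `read_jsonl` is a hypothetical helper, and no field names are assumed because the record layout is not documented here.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the offending line so a bad record is easy to locate.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({err})") from err
```

For example, `records = list(read_jsonl("Fine_Tuning/qc4167_qa.jsonl"))` would load the whole training set into memory, assuming the file sits at that path.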
Usage
Evaluating Models
Set up your API keys in the respective scripts by replacing the placeholder values:
```python
os.environ["OPENAI_API_KEY"] = "Your_OpenAI_API_Key"
os.environ["ANTHROPIC_API_KEY"] = "Your_Anthropic_API_Key"
os.environ["GROQ_API_KEY"] = "Your_Groq_API_Key"
os.environ["GOOGLE_API_KEY"] = "Your_Google_API_Key"
```
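Before a long run, it can be useful to confirm that no key is still unset or left as a placeholder. The check below is a sketch, not part of the repository: `missing_api_keys` is a hypothetical helper, and it assumes every placeholder starts with "Your_" as in the values above. Whether a given script needs all four keys is also an assumption.

```python
import os

# Environment variable names taken from the setup step above.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GROQ_API_KEY",
    "GOOGLE_API_KEY",
]


def missing_api_keys(env=None):
    """Return the names of keys that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_KEYS:
        value = env.get(name, "")
        # Assumption: placeholder values all start with "Your_".
        if not value or value.startswith("Your_"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing or placeholder keys:", ", ".join(missing))
```

Passing a plain dict as `env` makes the check easy to try without touching the real environment.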
Run the evaluation scripts:
```bash
# For main evaluation
python Main/QC_Bench_Main_Experiments.py

# For multilingual evaluation
python Multilingual/QC-Bench_Multilingual.py

# For true/false and open-ended evaluation
python True_and_False/QC-Bench_True_and_False.py

# For fine-tuning
python Fine_Tuning/QC_Bench_Fine_Tuning.py
```



Total Questions: 6,021
Multiple-choice: 5,184
True/False: 416
Open-ended: 421
Topics Covered:
Basic Concepts
Gates & Circuit Design
Quantum Machine Learning (QML)
Quantum Security
Error Correction
Quantum Algorithms
Distributed Computing
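The total reported above is the sum of the three per-format counts. A quick arithmetic check, using only the numbers stated in this README (the dictionary keys are illustrative labels, not file or field names):

```python
# Per-format question counts as stated in this README.
counts = {"multiple_choice": 5184, "true_false": 416, "open_ended": 421}

total = sum(counts.values())
assert total == 6021, f"expected 6021, got {total}"

# Share of each format in the full benchmark.
for name, n in counts.items():
    print(f"{name}: {n} ({n / total:.1%})")
```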
Paper
This benchmark is described in detail in our paper "QC-Bench: What Do LLMs Really Know About Quantum Computing?". Please cite our work if you use this benchmark in your research.

License
This dataset is provided under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.

